A latent variable ranking model for content-based retrieval
34th European Conference on IR Research, ECIR 2012, Barcelona, Spain, April 1-5, 2012. Proceedings
Since their introduction, ranking SVM models [11] have become a powerful tool for training content-based retrieval systems. All we need for training a model are retrieval examples in the form of triplet constraints, i.e. examples specifying that relative to some query, a database item a should be ranked higher than database item b. These types of constraints could be obtained from feedback of users of the retrieval system. Most previous ranking models learn either a global combination of elementary similarity functions or a combination defined with respect to a single database item. Instead, we propose a “coarse to fine” ranking model where given a query we first compute a distribution over “coarse” classes and then use the linear combination that has been optimized for queries of that class. These coarse classes are hidden and need to be induced by the training algorithm. We propose a latent variable ranking model that induces both the latent classes and the weights of the linear combination for each class from ranking triplets. Our experiments over two large image datasets and a text retrieval dataset show the advantages of our model over learning a global combination as well as a combination for each test point (i.e. transductive setting). Furthermore, compared to the transductive approach our model has a clear computational advantage since it does not need to be retrained for each test query.
Spanish Ministry of Science and Innovation (JCI-2009-04240); EU PASCAL2 Network of Excellence (FP7-ICT-216886
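The "coarse to fine" scoring described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the softmax form of the class distribution, the use of an expected score over classes, and every name and number below (`class_distribution`, `rank_score`, `CLASS_PARAMS`, `CLASS_WEIGHTS`, the toy similarities) are assumptions made for illustration only.

```python
import math


def class_distribution(query_features, class_params):
    """Softmax over per-class linear scores of the query (an assumed form)."""
    logits = [
        sum(v * q for v, q in zip(params, query_features))
        for params in class_params
    ]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rank_score(query_features, similarities, class_params, class_weights):
    """Expected score: sum over classes of p(class | query) * (w_class . similarities)."""
    probs = class_distribution(query_features, class_params)
    return sum(
        p * sum(w * s for w, s in zip(weights, similarities))
        for p, weights in zip(probs, class_weights)
    )


def triplet_satisfied(query_features, sims_a, sims_b, class_params, class_weights):
    """True if item a outranks item b for this query."""
    score_a = rank_score(query_features, sims_a, class_params, class_weights)
    score_b = rank_score(query_features, sims_b, class_params, class_weights)
    return score_a > score_b


# Toy, made-up values: two latent classes, two elementary similarity functions.
CLASS_PARAMS = [[10.0, 0.0], [0.0, 10.0]]  # hypothetical class-assignment parameters
CLASS_WEIGHTS = [[1.0, 0.0], [0.0, 1.0]]   # hypothetical per-class combination weights
SIMS_A = [0.9, 0.1]  # elementary similarities between the query and item a
SIMS_B = [0.1, 0.9]  # elementary similarities between the query and item b
```

With these toy values, a query with features `[1.0, 0.0]` falls almost entirely into the first class and ranks item a above item b, while a query with features `[0.0, 1.0]` reverses the order. A single global linear combination cannot express that dependence on the query.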
Semi-supervised prediction of protein interaction sentences exploiting semantically encoded metrics
Protein-protein interaction (PPI) identification is an integral component of many biomedical research and database curation tools. Automation of this task through classification is one of the key goals of text mining (TM). However, labelled PPI corpora required to train classifiers are generally small. In order to overcome this sparsity in the training data, we propose a novel method of integrating corpora that do not contain relevance judgements. Our approach uses a semantic language model to gather word similarity from a large unlabelled corpus. This additional information is integrated into the sentence classification process using kernel transformations and has a re-weighting effect on the training features that leads to an 8% improvement in F-score over the baseline results. Furthermore, we discover that some words which are generally considered indicative of interactions are actually neutralised by this process.
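One common way to fold word similarity into a kernel is the bilinear form k(x, y) = xᵀ S y, where S is a word-by-word similarity matrix. The sketch below shows that generic construction; whether the paper uses exactly this transformation is an assumption, and the vocabulary, the similarity values, and the function name `semantic_kernel` are hypothetical.

```python
def semantic_kernel(x, y, similarity):
    """Bilinear kernel k(x, y) = x^T S y over bag-of-words vectors.

    `similarity` is a symmetric word-by-word matrix S; it must be positive
    semidefinite for the result to be a valid kernel.
    """
    return sum(
        x[i] * similarity[i][j] * y[j]
        for i in range(len(x))
        for j in range(len(y))
    )


# Hypothetical three-word vocabulary: ["binds", "interacts", "cell"].
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Made-up similarities: "binds" and "interacts" are treated as related.
SIMILARITY = [[1.0, 0.8, 0.0], [0.8, 1.0, 0.0], [0.0, 0.0, 1.0]]

sentence_1 = [1.0, 0.0, 1.0]  # contains "binds" and "cell"
sentence_2 = [0.0, 1.0, 1.0]  # contains "interacts" and "cell"

linear_value = semantic_kernel(sentence_1, sentence_2, IDENTITY)
semantic_value = semantic_kernel(sentence_1, sentence_2, SIMILARITY)
```

With the identity matrix the two sentences overlap only on "cell". With the similarity matrix the related verbs also contribute, which is the kind of feature re-weighting the abstract describes.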
Machine Learning in Automated Text Categorization
The automated categorization (or classification) of texts into predefined
categories has witnessed a booming interest in the last ten years, due to the
increased availability of documents in digital form and the ensuing need to
organize them. In the research community the dominant approach to this problem
is based on machine learning techniques: a general inductive process
automatically builds a classifier by learning, from a set of preclassified
documents, the characteristics of the categories. The advantages of this
approach over the knowledge engineering approach (consisting in the manual
definition of a classifier by domain experts) are a very good effectiveness,
considerable savings in terms of expert manpower, and straightforward
portability to different domains. This survey discusses the main approaches to
text categorization that fall within the machine learning paradigm. We will
discuss in detail issues pertaining to three different problems, namely
document representation, classifier construction, and classifier evaluation.
Comment: Accepted for publication in ACM Computing Surveys
Determination of step-edge barriers to interlayer transport from surface morphology during the initial stages of homoepitaxial growth
We use analytic formulae obtained from a simple model of crystal growth by
molecular-beam epitaxy to determine step-edge barriers to interlayer
transport. The method is based on information about the surface morphology at
the onset of nucleation on top of first-layer islands in the submonolayer
coverage regime of homoepitaxial growth. The formulae are tested using kinetic
Monte Carlo simulations of a solid-on-solid model and applied to estimate
step-edge barriers from scanning-tunneling microscopy data on initial stages
of Fe(001), Pt(111), and Ag(111) homoepitaxy.
Comment: 4 pages, a Postscript file, uuencoded and compressed. Physical Review
B, Rapid Communications, in press
Pseudo Goldstone Bosons Phenomenology in Minimal Walking Technicolor
We construct the non-linearly realized Lagrangian for the Goldstone Bosons
associated to the breaking pattern of SU(4) to SO(4). This pattern is expected
to occur in any Technicolor extension of the standard model featuring two Dirac
fermions transforming according to real representations of the underlying gauge
group. We concentrate on the Minimal Walking Technicolor quantum number
assignments with respect to the standard model symmetries. We demonstrate that,
for any choice of the quantum numbers consistent with gauge and Witten
anomalies, the spectrum of the pseudo Goldstone Bosons contains electrically
doubly charged states which can be discovered at the Large Hadron Collider.
Comment: 25 pages, 5 figures
Racial discrimination and depressive symptoms among African-American men: The mediating and moderating roles of masculine self-reliance and John Henryism
Despite well-documented associations between everyday racial discrimination and depression, mechanisms underlying this association among African-American men are poorly understood. Guided by the Transactional Model of Stress and Coping, we frame masculine self-reliance and John Henryism as appraisal mechanisms that influence the relationship between racial discrimination, a source of significant psychosocial stress, and depressive symptoms among African-American men. We also investigate whether the proposed relationships vary by reported discrimination-specific coping responses. Participants were 478 African-American men recruited primarily from barbershops in the West and South regions of the United States. Multiple linear regression and Sobel-Goodman mediation analyses were used to examine direct and mediated associations between our study variables. Racial discrimination and masculine self-reliance were positively associated with depressive symptoms, though the latter only among active responders. John Henryism was negatively associated with depressive symptoms, mediated the masculine self-reliance-depressive symptoms relationship, and among active responders moderated the racial discrimination-depressive symptoms relationship. Though structural interventions are essential, clinical interventions designed to mitigate the mental health consequences of racial discrimination among African-American men should leverage masculine self-reliance and active coping mechanisms.
Classification of protein interaction sentences via gaussian processes
The increase in the availability of protein interaction studies in textual format, coupled with the demand for easier access to the key results, has led to a need for text mining solutions. In the text processing pipeline, classification is a key step for extraction of small sections of relevant text. Consequently, for the task of locating protein-protein interaction sentences, we examine the use of a classifier which has rarely been applied to text, Gaussian processes (GPs). GPs are a non-parametric probabilistic analogue to the more popular support vector machines (SVMs). We find that GPs outperform the SVM and naïve Bayes classifiers on binary sentence data, whilst showing equivalent performance on abstract and multiclass sentence corpora. In addition, the lack of the margin parameter, which requires costly tuning, along with the principled multiclass extensions enabled by the probabilistic framework make GPs an appealing alternative worthy of further adoption.
The Dynamics of a Rigid Body in Potential Flow with Circulation
We consider the motion of a two-dimensional body of arbitrary shape in a
planar irrotational, incompressible fluid with a given amount of circulation
around the body. We derive the equations of motion for this system by
performing symplectic reduction with respect to the group of volume-preserving
diffeomorphisms and obtain the relevant Poisson structures after a further
Poisson reduction with respect to the group of translations and rotations. In
this way, we recover the equations of motion given for this system by Chaplygin
and Lamb, and we give a geometric interpretation for the Kutta-Zhukowski force
as a curvature-related effect. In addition, we show that the motion of a rigid
body with circulation can be understood as a geodesic flow on a central
extension of the special Euclidean group SE(2), and we relate the cocycle in
the description of this central extension to a certain curvature tensor.
Comment: 28 pages, 2 figures; v2: typos corrected